Abstract

K.A.S. Immink and J.H. Weber recently defined and studied a 
channel with both gain and offset mismatch, modelling the behaviour 
of charge-leakage in flash memory. They proposed a decoding measure 
for this channel based on minimising Pearson distance (a notion from 
cluster analysis). The paper derives a formula for maximum likelihood 
decoding for this channel, and also defines and justifies a notion of 
minimum distance of a code in this context. 


1 Introduction 


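We begin by defining some notation. Let $n$ be an integer, $n > 3$. All our
vectors will have length $n$, and will have entries in the real numbers $\mathbb{R}$. For
a vector $\mathbf{x}$, we write $x_i$ for the $i$th entry of $\mathbf{x}$, we write

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

for the mean of $\mathbf{x}$ and we write

\[ \sigma_{\mathbf{x}} = \Bigl( \sum_{i=1}^{n} (x_i - \bar{x})^2 \Bigr)^{1/2} \]

for the (unnormalised) standard deviation of $\mathbf{x}$. We write $\mathbf{1}$ for the all-one
vector of length $n$, and call any scalar multiple of $\mathbf{1}$ a constant vector.
For vectors $\mathbf{u}$ and $\mathbf{v}$ that are not constant vectors, the Pearson correlation
coefficient $\rho_{\mathbf{u},\mathbf{v}}$ is defined by

\[ \rho_{\mathbf{u},\mathbf{v}} = \frac{\sum_{i=1}^{n} (u_i - \bar{u})(v_i - \bar{v})}{\sigma_{\mathbf{u}} \sigma_{\mathbf{v}}}. \]

Finally, the Pearson distance $\delta_{\mathrm{Pearson}}(\mathbf{u}, \mathbf{v})$ between vectors $\mathbf{u}$ and $\mathbf{v}$ is defined
to be

\[ \delta_{\mathrm{Pearson}}(\mathbf{u}, \mathbf{v}) = 1 - \rho_{\mathbf{u},\mathbf{v}}. \]

Since $\rho_{\mathbf{u},\mathbf{v}}$ lies between $-1$ and $1$, the Pearson distance lies between 0 and 2.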
Both Pearson distance and Pearson correlation are well-known concepts in 
the area of cluster analysis. 
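
The definitions above translate directly into code. The following is a minimal sketch in plain Python; the function names (`mean`, `sigma`, `pearson_corr`, `pearson_distance`) and the two example vectors are illustrative choices, not notation from [3].

```python
import math

def mean(x):
    return sum(x) / len(x)

def sigma(x):
    """Unnormalised standard deviation: square root of the sum of squared deviations."""
    m = mean(x)
    return math.sqrt(sum((xi - m) ** 2 for xi in x))

def pearson_corr(u, v):
    su, sv = sigma(u), sigma(v)
    if su == 0 or sv == 0:
        raise ValueError("Pearson correlation is undefined for constant vectors")
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

def pearson_distance(u, v):
    return 1.0 - pearson_corr(u, v)

u = [0, 1, 2, 3]
v = [3, 2, 1, 0]   # u reversed, so perfectly anti-correlated with u
print(pearson_distance(u, u), pearson_distance(u, v))
```

The two printed values sit (up to rounding) at the two ends of the range [0, 2] noted above.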

The channel considered by Kees A. Schouhamer Immink and Jos H. Weber
[3] is defined as follows. If the vector $\mathbf{x}$ is sent through the channel, the
channel outputs the received vector $\mathbf{r}$ where

\[ \mathbf{r} = a(\mathbf{x} + \boldsymbol{\nu}) + b\mathbf{1}. \]

Here $a$ (the gain) and $b$ (the offset) are unknown real numbers, with $a > 0$,
and

\[ \boldsymbol{\nu} = (\nu_1, \nu_2, \ldots, \nu_n) \]

where the $\nu_i$ are independently normally distributed with mean 0 and
standard deviation $\sigma$.
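
As an illustration, the channel can be simulated in a few lines. This is a sketch only: the function name `channel`, the seed, and the noise level 0.1 are assumptions made for the example, while the gain 1.07 and offset 0.07 are the values used in the simulations of Section 5.

```python
import random

def channel(x, a, b, noise_sd, rng):
    """Return r = a*(x + nu) + b*1, with nu_i drawn i.i.d. from N(0, noise_sd^2)."""
    if a <= 0:
        raise ValueError("the gain a must be positive")
    return [a * (xi + rng.gauss(0.0, noise_sd)) + b for xi in x]

rng = random.Random(0)            # fixed seed so the run is repeatable
x = [0, 1, 1, 0]
r = channel(x, a=1.07, b=0.07, noise_sd=0.1, rng=rng)
print(r)
```

Setting `noise_sd=0.0` gives the noiseless output $a\mathbf{x} + b\mathbf{1}$, which is a convenient sanity check.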

The channel is motivated by the properties of flash memory. We give some
basic details of this setting here; see Bi for more detailed introductions, 
and see (for example) |6l [TJ E] for another approach to modelling the problem 
using rank modulation codes. Flash memory is made up of an array of 
floating-gate transistors, known as flash cells. Data is stored in each cell by
varying the charge (equivalently, the voltage) on the cell. In single level cell
(SLC) flash memory, each cell stores one bit of information depending on
whether the voltage level is zero or non-zero. In more recent multi-level cell
(MLC) systems, more information is stored by allowing the cell to be charged
at one of several discrete non-zero voltage levels. The vector $\mathbf{x}$ corresponds
to the voltages we wish to store in a block of $n$ cells, so $x_i$ is the voltage
we wish to store in the $i$th cell. We cannot hope to initialise a cell with the
exact voltage we wish: the errors in this process give rise to the error term $\boldsymbol{\nu}$.
Over time, the voltage in each cell drops due to charge leakage. We assume
that the function that gives this voltage change is unknown, but is affine and 
is independent of which cell in the block we are examining. The unknown 
coefficients a and b specify this function; the coefficient a is positive since 
charge leakage is monotonic increasing: the more charge we have initially, 
the more we have after leakage. The received vector r thus models the set 
of voltages we retrieve from a block of cells we have initialised with voltages 
corresponding to x. 

We note that the channel does not model some aspects of flash memory: 
intercell coupling (where the charge on one cell influences the charge on 
neighbouring cells) is not modelled in any way; nor is the possibility that the 
magnitude of the error in the charging process depends on the charge in some 
way. Nevertheless, the channel is very natural and captures key properties 
of the process of retrieving data from flash memory. 

Immink and Weber assume that the vectors $\mathbf{x}$ lie in some finite subset
$C$ of $\mathbb{R}^n$. (In fact, they assume that $C \subseteq \{0, 1, \ldots, q-1\}^n$ for some fixed
integer $q$.) This corresponds to the fact that we initialise each cell with one of
a finite discrete set of voltages. To ensure unique decoding in the absence of
noise, they assume that if $\mathbf{x} \in C$ then no other codeword $\mathbf{y} \in C$ has the form
$\mathbf{y} = a\mathbf{x} + b\mathbf{1}$ for real numbers $a$ and $b$ with $a$ positive. They also assume that
no constant vector lies in $C$. This makes the Pearson distance between any
pair of vectors in $C$ well-defined; see Section 6.2 for additional motivation for
this assumption. Weber, Immink and Blackburn [5] have studied maximal
codes $C \subseteq \{0, 1, \ldots, q-1\}^n$ with these properties.

A decoder based on Pearson distance is proposed in this setting in [3].
So we decode a received vector $\mathbf{r}$ as $\hat{\mathbf{x}}$, where $\hat{\mathbf{x}} \in C$ minimises $\delta_{\mathrm{Pearson}}(\mathbf{r}, \hat{\mathbf{x}})$.
One motivation for this choice is that Pearson distance behaves well with
respect to an affine charge-leakage function, since

\[ \delta_{\mathrm{Pearson}}(\mathbf{r}, \mathbf{x}) = \delta_{\mathrm{Pearson}}(a\mathbf{r} + b\mathbf{1}, a\mathbf{x} + b\mathbf{1}). \]

Pearson distance has a natural geometric meaning: see Section 6.1 for a brief
discussion.
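
The invariance of Pearson distance under affine maps with positive gain is easy to confirm numerically. The sketch below uses arbitrary illustrative vectors and coefficients; the helper `affine` is a name introduced only for this example.

```python
import math

def pearson_distance(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    return 1.0 - num / den

def affine(x, a, b):
    """Apply x -> a*x + b*1 entrywise."""
    return [a * xi + b for xi in x]

r = [0.1, 0.9, 2.2, 0.4]
x = [0, 1, 2, 0]
d0 = pearson_distance(r, x)
d1 = pearson_distance(affine(r, 1.07, 0.07), x)                     # rescale r only
d2 = pearson_distance(affine(r, 0.5, -3.0), affine(x, 4.0, 2.0))    # rescale both
print(d0, d1, d2)
```

The three values agree up to floating-point rounding, for any positive gains and any offsets.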

In this paper, we derive a maximum likelihood decoding function for the 
channel in [3], and compare a decoder based on this function with a decoder
based on minimising Pearson distance. We also propose and justify a notion 
of minimum distance for codes used with this channel. 

We should emphasise that the model makes no assumptions on the distribution
of the unknown ('nuisance') parameters $a$ and $b$: if we know something
about these distributions, other decoding methods might be appropriate. For
example, if $a$ is known to be very close to 1, then decoding based on minimising
Euclidean distance is sensible; Immink and Weber [1] have proposed
a decoder based on minimising a weighted sum of Euclidean and Pearson 
distances in some situations. 

The remainder of the paper is structured as follows. Section 2 sets up
notation, and contains some preliminary lemmas. In Section 3 we show
how to achieve maximum likelihood decoding for this channel. Pearson
distance is not the measure to use for maximum likelihood decoding, but is
often a good approximation to it: simulations show comparable performance
between the maximum likelihood and Pearson decoders. Section 4 defines and justifies a
minimum distance measure for codes designed for the channel. In Section 5
we give some results of simulations that compare the approach in [3] with
the one taken here. Finally, Section 6 provides some comments on various
aspects of the model in [3].

2 Preliminaries 

This section contains notation that will be used in the remainder of this 
paper. Some simple facts, which will often be used without further comment, 
are also stated. 

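We define $\|\mathbf{u}\|$ to be the Euclidean length of $\mathbf{u} \in \mathbb{R}^n$, and we define $\delta(\mathbf{u}, \mathbf{v})$
to be the Euclidean distance between $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$.

Define the subspace $Z$ of $\mathbb{R}^n$ by

\[ Z = \{\mathbf{x} \in \mathbb{R}^n : \bar{x} = 0\} = \Bigl\{ (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n : \sum_{i=1}^{n} x_i = 0 \Bigr\}. \]

Let $\zeta : \mathbb{R}^n \to Z$ be defined by

\[ \zeta(\mathbf{x}) = \mathbf{x} - \bar{x}\mathbf{1}. \]

We can think of $\zeta$ as a 'normalisation', applying an offset to a vector so that
it has mean zero. Using $\zeta$ allows the formulas given in the introduction to be
expressed in a more geometric way. We now give more details. We see that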

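\[ \sigma_{\mathbf{u}} = \|\zeta(\mathbf{u})\|. \tag{1} \]

We write $\langle \mathbf{x}, \mathbf{y} \rangle$ for the standard inner product (the dot product) of $\mathbf{x}$ and
$\mathbf{y}$. So

\[ \langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^{n} x_i y_i. \]

Since $\langle \mathbf{x}, \mathbf{y} \rangle = \|\mathbf{x}\| \, \|\mathbf{y}\| \cos\theta$ where $\theta$ is the angle between $\mathbf{x}$ and $\mathbf{y}$, we see
that

\begin{align*}
\rho_{\mathbf{u},\mathbf{v}} &= \frac{\langle \zeta(\mathbf{u}), \zeta(\mathbf{v}) \rangle}{\sigma_{\mathbf{u}} \sigma_{\mathbf{v}}} \tag{2} \\
&= \frac{\|\zeta(\mathbf{u})\| \, \|\zeta(\mathbf{v})\| \cos\theta}{\|\zeta(\mathbf{u})\| \, \|\zeta(\mathbf{v})\|} \\
&= \cos\theta, \tag{3}
\end{align*}

where $\theta$ is the angle between $\zeta(\mathbf{u})$ and $\zeta(\mathbf{v})$.

Finally, we note that $\zeta(\zeta(\mathbf{u})) = \zeta(\mathbf{u})$, that $\zeta(\mathbf{u} + \mathbf{v}) = \zeta(\mathbf{u}) + \zeta(\mathbf{v})$ and
that $\sigma_{\zeta(\mathbf{u})} = \sigma_{\mathbf{u}}$.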


3 Maximum likelihood decoding 


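This section provides a proof of the following theorem:

Theorem 1. A maximum likelihood decoder decodes a received vector $\mathbf{r}$ to
the codeword $\mathbf{x}$ which minimises $\ell_{\mathbf{r}}(\mathbf{x})$, where

\[ \ell_{\mathbf{r}}(\mathbf{x}) = \begin{cases} \sigma_{\mathbf{x}}^2 (1 - \rho_{\mathbf{r},\mathbf{x}}^2) & \text{if } \rho_{\mathbf{r},\mathbf{x}} > 0, \\ \sigma_{\mathbf{x}}^2 & \text{otherwise.} \end{cases} \tag{4} \]

Before proving this theorem, we provide a geometrical interpretation for
the formula (4). For a non-zero vector $\mathbf{r} \in \mathbb{R}^n$, define

\[ U_{\mathbf{r}} = \{a'\mathbf{r} + b'\mathbf{1} \mid a', b' \in \mathbb{R}\}, \quad\text{and}\quad U_{\mathbf{r}}^{+} = \{a'\mathbf{r} + b'\mathbf{1} \mid a', b' \in \mathbb{R},\ a' > 0\}. \]

So $U_{\mathbf{r}}$ is a subspace, and $U_{\mathbf{r}}^{+}$ is a half-subspace, of $\mathbb{R}^n$. For a vector $\mathbf{r} \in \mathbb{R}^n$
we write $R_{\mathbf{r}}$ for the ray from the origin in the direction of $\mathbf{r}$, so

\[ R_{\mathbf{r}} = \{a'\mathbf{r} : a' \in \mathbb{R},\ a' > 0\}. \]

Lemma 2. Let $\mathbf{r}$ and $\mathbf{x}$ be vectors in $\mathbb{R}^n$. Let $d_1$ be the Euclidean distance
between $\mathbf{x}$ and $U_{\mathbf{r}}^{+}$. Let $d_2$ be the Euclidean distance between $\zeta(\mathbf{x})$ and $R_{\zeta(\mathbf{r})}$.
Then

\[ d_1^2 = d_2^2 = \ell_{\mathbf{r}}(\mathbf{x}). \]

Proof. We start by proving that $d_1 = d_2$. Let $\mathbf{u} = a'\zeta(\mathbf{r}) = a'\mathbf{r} - a'\bar{r}\mathbf{1} \in R_{\zeta(\mathbf{r})}$.
Then

\[ \delta(\zeta(\mathbf{x}), \mathbf{u}) = \delta(\mathbf{x} - \bar{x}\mathbf{1}, \mathbf{u}) = \delta(\mathbf{x}, \mathbf{u} + \bar{x}\mathbf{1}), \]

and $\mathbf{u} + \bar{x}\mathbf{1} = a'\mathbf{r} + (\bar{x} - a'\bar{r})\mathbf{1} \in U_{\mathbf{r}}^{+}$. So $d_1 \le d_2$.

Let $\mathbf{u} = a'\mathbf{r} + b'\mathbf{1} = a'\zeta(\mathbf{r}) + (b' + a'\bar{r})\mathbf{1} \in U_{\mathbf{r}}^{+}$. Then

\[ \delta(\mathbf{x}, \mathbf{u}) = \delta(\zeta(\mathbf{x}), \mathbf{u} - \bar{x}\mathbf{1}) = \delta(\zeta(\mathbf{x}), a'\zeta(\mathbf{r}) + (b' + a'\bar{r} - \bar{x})\mathbf{1}) \ge \delta(\zeta(\mathbf{x}), a'\zeta(\mathbf{r})), \]

since $\zeta(\mathbf{x}), a'\zeta(\mathbf{r}) \in Z$ and since $\mathbf{1}$ is orthogonal to $Z$. Since $a'\zeta(\mathbf{r}) \in R_{\zeta(\mathbf{r})}$,
we see that $d_2 \le d_1$. Hence $d_1 = d_2$.

We now prove that $d_2^2 = \ell_{\mathbf{r}}(\mathbf{x})$. There are two cases, depending on whether
or not the closest point $P$ to $\zeta(\mathbf{x})$ on the line generated by $\zeta(\mathbf{r})$ lies in the
ray $R_{\zeta(\mathbf{r})}$: see Figure 1. The first case, when $P$ lies on the ray, happens if
and only if $\langle \zeta(\mathbf{x}), \zeta(\mathbf{r}) \rangle > 0$. This happens exactly when $\rho_{\mathbf{r},\mathbf{x}} > 0$, by (2). In
this case,

\[ d_2^2 = \|\zeta(\mathbf{x})\|^2 \sin^2\theta = \|\zeta(\mathbf{x})\|^2 (1 - \cos^2\theta) = \sigma_{\mathbf{x}}^2 (1 - \rho_{\mathbf{r},\mathbf{x}}^2), \]

where $\theta$ is the angle between $\zeta(\mathbf{x})$ and $\zeta(\mathbf{r})$, by (1) and (3). In the second
case, when $\langle \zeta(\mathbf{x}), \zeta(\mathbf{r}) \rangle \le 0$, the distance between $\zeta(\mathbf{x})$ and the ray $R_{\zeta(\mathbf{r})}$ is
given by the distance from $\zeta(\mathbf{x})$ to the origin. So

\[ d_2^2 = \|\zeta(\mathbf{x})\|^2 = \sigma_{\mathbf{x}}^2, \]

by (1). This establishes the lemma. □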
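
Lemma 2 can be checked numerically. The sketch below compares the closed form (4) with a brute-force grid search over the gain $a' > 0$; for fixed $a'$ the best offset is $b' = \bar{x} - a'\bar{r}$, which leaves $\|\zeta(\mathbf{x}) - a'\zeta(\mathbf{r})\|^2$. The grid size, the bound `a_max` and the test vectors are arbitrary choices made for this illustration.

```python
def centre(x):
    """zeta(x): subtract the mean from every entry."""
    m = sum(x) / len(x)
    return [xi - m for xi in x]

def ell(r, x):
    """Closed form (4): sigma_x^2 (1 - rho^2) if rho > 0, else sigma_x^2."""
    zr, zx = centre(r), centre(x)
    sx2 = sum(t * t for t in zx)
    sr2 = sum(t * t for t in zr)
    c = sum(s * t for s, t in zip(zr, zx))
    if c <= 0:
        return sx2
    return sx2 - c * c / sr2          # equals sigma_x^2 (1 - rho^2)

def grid_distance_sq(r, x, steps=20000, a_max=5.0):
    """Approximate the infimum over a' in (0, a_max] of ||zeta(x) - a' zeta(r)||^2."""
    zr, zx = centre(r), centre(x)
    best = float("inf")
    for k in range(1, steps + 1):
        a = a_max * k / steps
        best = min(best, sum((t - a * s) ** 2 for s, t in zip(zr, zx)))
    return best

r = [0.2, 1.1, 1.9, 3.2]
x_pos = [0, 1, 2, 3]      # positively correlated with r: first case of the lemma
x_neg = [3, 2, 1, 0]      # negatively correlated with r: second case
print(ell(r, x_pos), grid_distance_sq(r, x_pos))
print(ell(r, x_neg), grid_distance_sq(r, x_neg))
```

In the second case the infimum is only approached as $a' \to 0$, so the grid value lies slightly above $\sigma_{\mathbf{x}}^2$.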

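Proof of Theorem 1. Since the components of $\boldsymbol{\nu}$ are picked independently
according to a normal distribution with mean 0 and standard deviation $\sigma$,
each value of $\boldsymbol{\nu}$ is associated with the value of the corresponding normal
probability density function $f(\boldsymbol{\nu})$, where

\[ f(\boldsymbol{\nu}) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\bigl(-\nu_i^2/(2\sigma^2)\bigr). \]

Figure 1: The distance of a point to a ray: two cases

For vectors $\mathbf{r}$ and $\mathbf{x}$, define

\[ L_{a,b}(\mathbf{x} \mid \mathbf{r}) = f((\mathbf{r} - b\mathbf{1})/a - \mathbf{x}). \]

This is the likelihood of $\mathbf{x}$ given $\mathbf{r}$ when $a$ and $b$ are fixed, since
$\boldsymbol{\nu} = (\mathbf{r} - b\mathbf{1})/a - \mathbf{x}$ in this case.

In maximum likelihood decoding, we decode a received vector
$\mathbf{r} = a(\mathbf{x} + \boldsymbol{\nu}) + b\mathbf{1}$ as $\hat{\mathbf{x}} \in C$, where $\hat{\mathbf{x}}$ is the codeword that maximises

\begin{align*}
\max_{a,b \in \mathbb{R},\, a > 0} L_{a,b}(\hat{\mathbf{x}} \mid \mathbf{r}) &= \max_{a,b \in \mathbb{R},\, a > 0} f((\mathbf{r} - b\mathbf{1})/a - \hat{\mathbf{x}}) \\
&= \max_{a',b' \in \mathbb{R},\, a' > 0} f(a'\mathbf{r} + b'\mathbf{1} - \hat{\mathbf{x}}),
\end{align*}

where $a' = 1/a$ and $b' = -b/a$. The logarithm function is strictly increasing on
the positive real numbers, and $f$ is a positive function. So equivalently we
want to find $\hat{\mathbf{x}} \in C$ that maximises $\max_{a',b' \in \mathbb{R},\, a' > 0} \log f(a'\mathbf{r} + b'\mathbf{1} - \hat{\mathbf{x}})$. But

\[ \log f(a'\mathbf{r} + b'\mathbf{1} - \hat{\mathbf{x}}) = -n \log(\sigma\sqrt{2\pi}) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (a' r_i + b' - \hat{x}_i)^2. \]

Since $-n \log(\sigma\sqrt{2\pi})$ is a constant (in other words, independent of $\hat{\mathbf{x}}$ and
$\mathbf{r}$), and since $1/(2\sigma^2)$ is a positive constant, we see that a maximum likelihood
decoder finds a codeword $\hat{\mathbf{x}}$ that minimises

\[ \min_{a',b' \in \mathbb{R},\, a' > 0} \sum_{i=1}^{n} (a' r_i + b' - \hat{x}_i)^2, \]

which is the square of the Euclidean distance between $U_{\mathbf{r}}^{+}$ and $\hat{\mathbf{x}}$. But,
by Lemma 2, this is exactly the same as minimising the function $\ell_{\mathbf{r}}(\hat{\mathbf{x}})$, as
required. □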
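
To make Theorem 1 concrete, here is a small sketch of a maximum likelihood decoder next to a minimum Pearson distance decoder. The codebook (every non-constant binary word of length 4), the hand-picked noise vector and all function names are assumptions made for this example; the gain 1.07 and offset 0.07 are the values used in Section 5.

```python
import math
from itertools import product

def centre(x):
    """zeta(x): subtract the mean from every entry."""
    m = sum(x) / len(x)
    return [xi - m for xi in x]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ell(r, x):
    """ML metric (4): sigma_x^2 (1 - rho^2) if rho > 0, else sigma_x^2."""
    zr, zx = centre(r), centre(x)
    sx2, c = dot(zx, zx), dot(zr, zx)
    return sx2 - c * c / dot(zr, zr) if c > 0 else sx2

def pearson_distance(r, x):
    zr, zx = centre(r), centre(x)
    return 1.0 - dot(zr, zx) / math.sqrt(dot(zr, zr) * dot(zx, zx))

def ml_decode(r, code):
    return min(code, key=lambda x: ell(r, x))

def pearson_decode(r, code):
    return min(code, key=lambda x: pearson_distance(r, x))

# Illustrative codebook: every non-constant binary word of length 4.
code = [w for w in product((0, 1), repeat=4) if 0 < sum(w) < 4]

sent = (0, 1, 1, 0)
nu = (0.05, -0.10, 0.08, -0.02)                     # fixed, hand-picked noise
r = [1.07 * (s + e) + 0.07 for s, e in zip(sent, nu)]
print(ml_decode(r, code), pearson_decode(r, code))
```

Neither decoder needs to know the gain or the offset: both metrics depend on $\mathbf{r}$ only through the direction of $\zeta(\mathbf{r})$.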

We describe techniques to reduce the amount of computation the maximum
likelihood decoder needs. Firstly, the value $\sigma_{\mathbf{x}}^2$ can be precomputed
for all codewords $\mathbf{x} \in C$. Secondly, for codes such as 2-constrained codes [3]
that are preserved under permuting their coordinates, we can significantly
reduce the number of codewords we need to consider by making the following
observations. The value of $\sigma_{\mathbf{x}}$ is not changed if we permute the coordinates of
$\mathbf{x}$, and the value of $\rho_{\mathbf{r},\mathbf{x}}$ is maximised when we permute the coordinates of $\mathbf{x}$ to
have the same order as the coordinates of $\mathbf{r}$. So we may use the 'composition
code' decomposition technique from [3, Section IV.B] to decode more efficiently,
only storing codewords that are in sorted order. Finally, we observe
that 2-constrained codes $C$ have the property that whenever $\mathbf{x} \in C$ then its
complement $\mathbf{y} = (q-1)\mathbf{1} - \mathbf{x}$ also lies in $C$. We note that $\zeta(\mathbf{x}) = -\zeta(\mathbf{y})$ and
so we find that $\sigma_{\mathbf{x}} = \sigma_{\mathbf{y}}$ and $\rho_{\mathbf{r},\mathbf{x}} = -\rho_{\mathbf{r},\mathbf{y}}$ for any non-constant received word
$\mathbf{r}$. So for codes which are closed under taking complements, we only need
to store one codeword from each pair $\{\mathbf{x}, \mathbf{y}\}$. If we do this, we search for a
codeword $\mathbf{x}$ that minimises $\sigma_{\mathbf{x}}^2 (1 - \rho_{\mathbf{r},\mathbf{x}}^2)$, then decode to the complement
of $\mathbf{x}$ when $\rho_{\mathbf{r},\mathbf{x}} < 0$ and decode to $\mathbf{x}$ otherwise. This technique can be combined
with the composition code technique above. The technique can also
be used with the decoder in [3]: here we find a codeword maximising $|\rho_{\mathbf{r},\mathbf{x}}|$,
and decode to this codeword if $\rho_{\mathbf{r},\mathbf{x}} > 0$ or to its complement otherwise.
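
The complement trick can be sketched as follows. The example assumes a binary codebook closed under complements (all non-constant words of length 4) and stores only the words whose first entry is 0; these choices, the seed, and the function names are illustrative and are not taken from [3].

```python
import random
from itertools import product

def centre(x):
    m = sum(x) / len(x)
    return [xi - m for xi in x]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ell(r, x):
    """ML metric (4)."""
    zr, zx = centre(r), centre(x)
    sx2, c = dot(zx, zx), dot(zr, zx)
    return sx2 - c * c / dot(zr, zr) if c > 0 else sx2

def complement(x, q):
    return tuple(q - 1 - xi for xi in x)

def decode_full(r, code):
    """Reference decoder: search the whole codebook."""
    return min(code, key=lambda x: ell(r, x))

def decode_half(r, half, q):
    """Search one word per complementary pair, then fix the sign at the end."""
    zr = centre(r)
    sr2 = dot(zr, zr)
    best, best_val, best_c = None, float("inf"), 0.0
    for x in half:
        zx = centre(x)
        c = dot(zr, zx)                     # has the same sign as rho_{r,x}
        val = dot(zx, zx) - c * c / sr2     # sigma_x^2 (1 - rho^2)
        if val < best_val:
            best, best_val, best_c = x, val, c
    return best if best_c >= 0 else complement(best, q)

q = 2
code = [w for w in product(range(q), repeat=4) if 0 < sum(w) < 4]
half = [w for w in code if w[0] == 0]       # one word from each pair {x, y}

rng = random.Random(1)
r = [rng.gauss(0.0, 1.0) for _ in range(4)]
print(decode_full(r, code), decode_half(r, half, q))
```

In practice $\sigma_{\mathbf{x}}^2$ and $\zeta(\mathbf{x})$ would be precomputed for each stored word rather than recomputed per received vector, as suggested above.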


4 The distance between codewords 

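For codewords $\mathbf{u}, \mathbf{v} \in C$, we define a (squared) distance measure $\delta'(\mathbf{u}, \mathbf{v})$ by

\[ \delta'(\mathbf{u}, \mathbf{v}) = \begin{cases} \sigma_{\mathbf{u}}^2 \sigma_{\mathbf{v}}^2 (1 - \rho_{\mathbf{u},\mathbf{v}}^2) / \sigma_{\mathbf{u}+\mathbf{v}}^2 & \text{when } \rho_{\mathbf{u},\mathbf{v}} > -\min\{\sigma_{\mathbf{v}}/\sigma_{\mathbf{u}},\ \sigma_{\mathbf{u}}/\sigma_{\mathbf{v}}\}, \\ \min\{\sigma_{\mathbf{u}}^2, \sigma_{\mathbf{v}}^2\} & \text{otherwise.} \end{cases} \]

Note that $\delta'(\mathbf{u}, \mathbf{v}) = \delta'(\mathbf{v}, \mathbf{u})$. Also note that $\delta'(\mathbf{u}, \mathbf{v})$ depends only on $\zeta(\mathbf{u})$
and $\zeta(\mathbf{v})$, by (1) and (2). Finally, we claim that

\[ \sigma_{\mathbf{u}}^2 \sigma_{\mathbf{v}}^2 (1 - \rho_{\mathbf{u},\mathbf{v}}^2) / \sigma_{\mathbf{u}+\mathbf{v}}^2 \le \min\{\sigma_{\mathbf{u}}^2, \sigma_{\mathbf{v}}^2\}. \tag{5} \]

To see this, we may verify by routine calculation that

\begin{align*}
\sigma_{\mathbf{u}}^2 \sigma_{\mathbf{v}}^2 (1 - \rho_{\mathbf{u},\mathbf{v}}^2) / \sigma_{\mathbf{u}+\mathbf{v}}^2
&= \frac{\langle \zeta(\mathbf{u}), \zeta(\mathbf{u}) \rangle \langle \zeta(\mathbf{v}), \zeta(\mathbf{v}) \rangle - \langle \zeta(\mathbf{u}), \zeta(\mathbf{v}) \rangle^2}{\langle \zeta(\mathbf{u} + \mathbf{v}), \zeta(\mathbf{u} + \mathbf{v}) \rangle} \\
&= \langle \zeta(\mathbf{u}), \zeta(\mathbf{u}) \rangle - \frac{\langle \zeta(\mathbf{u}), \zeta(\mathbf{u}) + \zeta(\mathbf{v}) \rangle^2}{\langle \zeta(\mathbf{u} + \mathbf{v}), \zeta(\mathbf{u} + \mathbf{v}) \rangle} \\
&\le \langle \zeta(\mathbf{u}), \zeta(\mathbf{u}) \rangle = \sigma_{\mathbf{u}}^2. \tag{6}
\end{align*}

A similar calculation shows that the left hand side of (5) is at most $\sigma_{\mathbf{v}}^2$, and
so the claim follows.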
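
The distance measure and the bound (5) can be explored numerically with a short sketch. The function name `delta_prime` and the example vectors are illustrative. The branch condition is tested in the equivalent form $\langle \zeta(\mathbf{u}), \zeta(\mathbf{v}) \rangle > -\min\{\sigma_{\mathbf{u}}^2, \sigma_{\mathbf{v}}^2\}$, obtained by multiplying the condition on $\rho_{\mathbf{u},\mathbf{v}}$ through by $\sigma_{\mathbf{u}}\sigma_{\mathbf{v}}$, which avoids square roots.

```python
import random

def centre(x):
    m = sum(x) / len(x)
    return [xi - m for xi in x]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def delta_prime(u, v):
    """Squared distance measure between two non-constant vectors."""
    zu, zv = centre(u), centre(v)
    su2, sv2, uv = dot(zu, zu), dot(zv, zv), dot(zu, zv)
    if uv > -min(su2, sv2):                 # same as rho > -min(sv/su, su/sv)
        s_sum2 = su2 + 2.0 * uv + sv2       # sigma_{u+v}^2
        return (su2 * sv2 - uv * uv) / s_sum2
    return min(su2, sv2)

u = (0, 1, 1, 0)
v = (0, 0, 1, 1)          # zeta(u) and zeta(v) are orthogonal
w = (1, 0, 0, 1)          # the complement of u, so zeta(w) = -zeta(u)
print(delta_prime(u, v), delta_prime(u, w))

# Spot-check symmetry and the bound (5) on random real vectors.
rng = random.Random(0)
for _ in range(1000):
    a = [rng.gauss(0.0, 1.0) for _ in range(5)]
    b = [rng.gauss(0.0, 1.0) for _ in range(5)]
    bound = min(dot(centre(a), centre(a)), dot(centre(b), centre(b)))
    assert delta_prime(a, b) <= bound + 1e-9
    assert abs(delta_prime(a, b) - delta_prime(b, a)) < 1e-9
```

The pair of complementary words falls in the second branch of the definition, where the distance is $\min\{\sigma_{\mathbf{u}}^2, \sigma_{\mathbf{v}}^2\}$.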

In this section, we will give a geometric interpretation for 5'(u, v), and 
we relate the minimum distance of a code (using this notion of distance) to 
the error rate of a maximum likelihood decoder. 

We note that [3] defines a different distance measure (namely the distance
$d_2(\mathbf{u}, \mathbf{v}) = 2\sigma_{\mathbf{u}}^2 (1 - \rho_{\mathbf{u},\mathbf{v}})$, which is not symmetrical in $\mathbf{u}$ and $\mathbf{v}$) to be used
to calculate the minimum distance of a code in this context. This distance
measure is natural for the decoder in [3]; in Section 5 we briefly compare this
measure with the measure above.


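Lemma 3. Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$. Then $\delta' = \delta'(\mathbf{x}, \mathbf{y})$ is the largest real number $\delta'$
with the following property. Let $B(\mathbf{x}, \delta')$ be the ball in $Z$ of radius $\sqrt{\delta'}$ and
centre $\zeta(\mathbf{x})$ (using Euclidean distance). Let $B(\mathbf{y}, \delta')$ be the ball in $Z$ of radius
$\sqrt{\delta'}$ and centre $\zeta(\mathbf{y})$. Then there is no ray $R_{\zeta(\mathbf{r})}$ that intersects the interior
of both $B(\mathbf{x}, \delta')$ and $B(\mathbf{y}, \delta')$.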


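Proof. Firstly, suppose that $\rho_{\mathbf{x},\mathbf{y}} > -\min\{\sigma_{\mathbf{y}}/\sigma_{\mathbf{x}},\ \sigma_{\mathbf{x}}/\sigma_{\mathbf{y}}\}$. In particular this
means that $\zeta(\mathbf{x}) \ne -\zeta(\mathbf{y})$, so $\zeta(\mathbf{x} + \mathbf{y})$ is a non-zero vector.

The typical situation in this case is drawn in Figure 2. Let $P$ be the
subplane of $Z$ generated by $\zeta(\mathbf{x})$ and $\zeta(\mathbf{y})$, and let $K$ be the subplane of
vectors orthogonal to $P$. Let $H$ be the hyperplane in $Z$ generated by $\zeta(\mathbf{x} + \mathbf{y})$
and $K$. We have that $\zeta(\mathbf{x})$ and $\zeta(\mathbf{y})$ lie on different sides of $H$. The closest
point in $H$ to $\zeta(\mathbf{x})$ lies in $P$, and so lies on the line generated by $\zeta(\mathbf{x}) + \zeta(\mathbf{y})$;
the same is true for the closest point to $\zeta(\mathbf{y})$. Setting $\theta$ to be the angle
between $\zeta(\mathbf{x})$ and $\zeta(\mathbf{x}) + \zeta(\mathbf{y})$, we find that the squared distance between $H$
and $\zeta(\mathbf{x})$ is

\begin{align*}
\|\zeta(\mathbf{x})\|^2 \sin^2\theta &= \|\zeta(\mathbf{x})\|^2 - \|\zeta(\mathbf{x})\|^2 \cos^2\theta \\
&= \langle \zeta(\mathbf{x}), \zeta(\mathbf{x}) \rangle - \frac{\langle \zeta(\mathbf{x}), \zeta(\mathbf{x}) + \zeta(\mathbf{y}) \rangle^2}{\langle \zeta(\mathbf{x} + \mathbf{y}), \zeta(\mathbf{x} + \mathbf{y}) \rangle} \\
&= \sigma_{\mathbf{x}}^2 \sigma_{\mathbf{y}}^2 (1 - \rho_{\mathbf{x},\mathbf{y}}^2) / \sigma_{\mathbf{x}+\mathbf{y}}^2 \quad \text{by (6)} \\
&= \delta'.
\end{align*}

Figure 2: A hyperplane in $Z$ at equal distance from $\zeta(\mathbf{x})$ and $\zeta(\mathbf{y})$

So the interior of $B(\mathbf{x}, \delta')$ does not intersect $H$. Similarly, $\zeta(\mathbf{y})$ is also at
distance $\sqrt{\delta'}$ from $H$ and so the interior of $B(\mathbf{y}, \delta')$ does not intersect $H$. So
the interiors of $B(\mathbf{x}, \delta')$ and $B(\mathbf{y}, \delta')$ lie on different sides of a hyperplane,
and therefore no ray from the origin intersects them both, as required.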

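We now show that the value for $\delta'$ is optimal, by proving that the ray
$R_{\zeta(\mathbf{x}+\mathbf{y})}$ touches the boundaries of both $B(\mathbf{x}, \delta')$ and $B(\mathbf{y}, \delta')$. The nearest
point to $\zeta(\mathbf{x})$ on the line generated by $\zeta(\mathbf{x} + \mathbf{y})$ is given by

\[ \frac{\|\zeta(\mathbf{x})\| \cos\theta}{\|\zeta(\mathbf{x} + \mathbf{y})\|} \, \zeta(\mathbf{x} + \mathbf{y}) = \frac{\langle \zeta(\mathbf{x}), \zeta(\mathbf{x} + \mathbf{y}) \rangle}{\|\zeta(\mathbf{x} + \mathbf{y})\|^2} \, \zeta(\mathbf{x} + \mathbf{y}), \]

where $\theta$ is the angle between $\zeta(\mathbf{x})$ and $\zeta(\mathbf{x} + \mathbf{y})$.

Figure 3: A typical case when $\|\zeta(\mathbf{x})\| < \|\zeta(\mathbf{y})\|$

So $R_{\zeta(\mathbf{x}+\mathbf{y})}$ touches $B(\mathbf{x}, \delta')$ if and only if $\langle \zeta(\mathbf{x}), \zeta(\mathbf{x} + \mathbf{y}) \rangle > 0$. But

\begin{align*}
\langle \zeta(\mathbf{x}), \zeta(\mathbf{x} + \mathbf{y}) \rangle &= \langle \zeta(\mathbf{x}), \zeta(\mathbf{x}) \rangle + \langle \zeta(\mathbf{x}), \zeta(\mathbf{y}) \rangle \\
&= \sigma_{\mathbf{x}}^2 + \sigma_{\mathbf{x}} \sigma_{\mathbf{y}} \rho_{\mathbf{x},\mathbf{y}} \quad \text{by (1) and (2)} \\
&> \sigma_{\mathbf{x}}^2 - \sigma_{\mathbf{x}} \sigma_{\mathbf{y}} (\sigma_{\mathbf{x}}/\sigma_{\mathbf{y}}) \\
&= 0.
\end{align*}

So $R_{\zeta(\mathbf{x}+\mathbf{y})}$ touches $B(\mathbf{x}, \delta')$. The argument that $R_{\zeta(\mathbf{x}+\mathbf{y})}$ touches $B(\mathbf{y}, \delta')$ is
similar, and uses the fact that $\rho_{\mathbf{x},\mathbf{y}} > -\sigma_{\mathbf{y}}/\sigma_{\mathbf{x}}$. This shows that our value for
$\delta'$ is optimal in this case.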

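We now turn to the case when $\rho_{\mathbf{x},\mathbf{y}} \le -\min\{\sigma_{\mathbf{y}}/\sigma_{\mathbf{x}},\ \sigma_{\mathbf{x}}/\sigma_{\mathbf{y}}\}$. See Figure 3
for a typical situation. Without loss of generality, assume that $\sigma_{\mathbf{x}} \le \sigma_{\mathbf{y}}$. So
$\delta' = \sigma_{\mathbf{x}}^2$ and $\rho_{\mathbf{x},\mathbf{y}} \le -\sigma_{\mathbf{x}}/\sigma_{\mathbf{y}}$.

Let $H = \zeta(\mathbf{x})^{\perp}$, so

\[ H = \{\mathbf{u} \in Z : \langle \zeta(\mathbf{x}), \mathbf{u} \rangle = 0\} \]

is the hyperplane in $Z$ of all vectors that are orthogonal to $\zeta(\mathbf{x})$. Clearly the
nearest point on $H$ to $\zeta(\mathbf{x})$ is the origin, so $\zeta(\mathbf{x})$ is at distance $\|\zeta(\mathbf{x})\| = \sigma_{\mathbf{x}}$
from $H$. Moreover, all points $\mathbf{u}$ in the interior of $B(\mathbf{x}, \delta')$ have $\langle \zeta(\mathbf{x}), \mathbf{u} \rangle > 0$.
Now let $\mathbf{u}$ be a point in the interior of $B(\mathbf{y}, \delta')$. Then $\mathbf{u} = \zeta(\mathbf{y}) + \mathbf{v}$, where
$\|\mathbf{v}\| < \sqrt{\delta'}$ and so

\begin{align*}
\langle \zeta(\mathbf{x}), \mathbf{u} \rangle &= \langle \zeta(\mathbf{x}), \zeta(\mathbf{y}) \rangle + \langle \zeta(\mathbf{x}), \mathbf{v} \rangle \\
&< \langle \zeta(\mathbf{x}), \zeta(\mathbf{y}) \rangle + \|\zeta(\mathbf{x})\| \sqrt{\delta'} \\
&= \|\zeta(\mathbf{x})\| \, \|\zeta(\mathbf{y})\| \rho_{\mathbf{x},\mathbf{y}} + \|\zeta(\mathbf{x})\| \sigma_{\mathbf{x}} \\
&\le \sigma_{\mathbf{x}} \sigma_{\mathbf{y}} (-\sigma_{\mathbf{x}}/\sigma_{\mathbf{y}}) + \sigma_{\mathbf{x}}^2 \\
&= 0.
\end{align*}

Thus all points in the interior of $B(\mathbf{y}, \delta')$ lie on the opposite side of the
hyperplane $H$ to the points in the interior of $B(\mathbf{x}, \delta')$. So no ray from the
origin can pass through the interiors of both $B(\mathbf{x}, \delta')$ and $B(\mathbf{y}, \delta')$, as required. Finally, it is
easy to see that no larger value of $\delta'$ can have this property, for when $\delta' > \sigma_{\mathbf{x}}^2$
we find that the origin is in the interior of $B(\mathbf{x}, \delta')$, and so all rays from the
origin (including, for example, $R_{\zeta(\mathbf{y})}$) pass through $B(\mathbf{x}, \delta')$. □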

Theorem 4. Let $C \subseteq \mathbb{R}^n$ be a finite set of non-constant codewords. Define
the minimum distance $\delta'$ of $C$ by

\[ \delta' = \min_{\mathbf{x}, \mathbf{y} \in C,\ \mathbf{x} \ne \mathbf{y}} \delta'(\mathbf{x}, \mathbf{y}). \]

The word error probability of a maximum likelihood decoder is bounded above
by the probability that $\chi^2(n-1) \ge \delta'/\sigma^2$, where $\chi^2(n-1)$ is the chi-squared
distribution with $n - 1$ degrees of freedom.

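Proof. When a codeword $\mathbf{x}$ is transmitted, the decoder receives a vector
$\mathbf{r} = a(\mathbf{x} + \boldsymbol{\nu}) + b\mathbf{1}$ where $a$ and $b$ are unknown, and the components of $\boldsymbol{\nu}$ are normally
distributed with mean 0 and standard deviation $\sigma$. Let $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ be an
orthonormal basis for $\mathbb{R}^n$, with $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_{n-1}$ spanning $Z$. We may write
$\boldsymbol{\nu} = \boldsymbol{\nu}' + c\,\mathbf{e}_n$ where

\[ \boldsymbol{\nu}' = \nu'_1 \mathbf{e}_1 + \nu'_2 \mathbf{e}_2 + \cdots + \nu'_{n-1} \mathbf{e}_{n-1} \]

and where the real numbers $\nu'_i$ and $c$ are independent and normally distributed
with mean 0 and standard deviation $\sigma$. Now $\|\boldsymbol{\nu}'\|^2/\sigma^2$ is a chi-squared
random variable with $n - 1$ degrees of freedom, so $\|\boldsymbol{\nu}'\|^2 \ge \delta'$ with
probability equal to the probability that $\chi^2(n-1) \ge \delta'/\sigma^2$. Assume that
$\|\boldsymbol{\nu}'\|^2 < \delta'$. It suffices to show that our maximum likelihood decoder returns
the codeword $\mathbf{x}$.

Figure 4: Word error rate when $q = 2$ and $n = 4$

Note that $\zeta(\boldsymbol{\nu}) = \boldsymbol{\nu}'$ and so $\zeta(\mathbf{r}) = a(\zeta(\mathbf{x}) + \boldsymbol{\nu}')$. The ray $R_{\zeta(\mathbf{r})}$ passes
within a squared distance of $\|\boldsymbol{\nu}'\|^2$ from $\zeta(\mathbf{x})$, since $\zeta(\mathbf{x}) + \boldsymbol{\nu}'$ lies on this ray.
So $\ell_{\mathbf{r}}(\mathbf{x}) \le \|\boldsymbol{\nu}'\|^2 < \delta'$. Let $\hat{\mathbf{x}} \in C$ be such that $\hat{\mathbf{x}} \ne \mathbf{x}$. Lemma 3 and the
definition of $\delta'$ show that $R_{\zeta(\mathbf{r})}$ cannot intersect the interior of a ball in $Z$
of radius $\sqrt{\delta'}$ centred at $\zeta(\hat{\mathbf{x}})$. Hence $\ell_{\mathbf{r}}(\hat{\mathbf{x}}) \ge \delta'$. Thus a maximum likelihood
decoder will correctly decode to $\mathbf{x}$. □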
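
As a worked illustration of Theorem 4, the sketch below computes the minimum distance of a small code and the resulting chi-squared tail bound using only the Python standard library. The codebook, the noise level 0.1 and the function names are assumptions for the example. The tail probability is evaluated through the standard recurrence $Q(s+1, t) = Q(s, t) + t^s e^{-t}/\Gamma(s+1)$ for the regularised upper incomplete gamma function, starting from $Q(1/2, t) = \operatorname{erfc}(\sqrt{t})$ or $Q(1, t) = e^{-t}$.

```python
import math
from itertools import combinations, product

def centre(x):
    m = sum(x) / len(x)
    return [xi - m for xi in x]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def delta_prime(u, v):
    """Squared distance measure of Section 4."""
    zu, zv = centre(u), centre(v)
    su2, sv2, uv = dot(zu, zu), dot(zv, zv), dot(zu, zv)
    if uv > -min(su2, sv2):
        return (su2 * sv2 - uv * uv) / (su2 + 2.0 * uv + sv2)
    return min(su2, sv2)

def min_distance(code):
    return min(delta_prime(u, v) for u, v in combinations(code, 2))

def chi2_sf(k, t):
    """P(chi-squared with k degrees of freedom exceeds t), for an integer k >= 1."""
    if t <= 0:
        return 1.0
    x = t / 2.0
    if k % 2 == 0:
        s, q = 1.0, math.exp(-x)                 # Q(1, x)
    else:
        s, q = 0.5, math.erfc(math.sqrt(x))      # Q(1/2, x)
    while 2 * s < k:
        q += math.exp(s * math.log(x) - x - math.lgamma(s + 1.0))
        s += 1.0
    return q

def word_error_bound(code, noise_sd):
    n = len(code[0])
    return chi2_sf(n - 1, min_distance(code) / noise_sd ** 2)

# Illustrative codebook: every non-constant binary word of length 4.
code = [w for w in product((0, 1), repeat=4) if 0 < sum(w) < 4]
print(min_distance(code), word_error_bound(code, 0.1))
```

The bound is driven entirely by the worst pair of codewords, so it can be pessimistic for codes in which most pairs are far apart.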

5 Comparing the two decoders 

Simulations indicate that the decoder in [3] has a comparable performance
with the maximum likelihood decoder when word error rate is considered.
Figure 4 shows the results of a simulation for the maximum likelihood decoder
when $q = 2$ and $n = 4$ for a range of noise levels when a 1-constrained code [3]
is used: the horizontal axis is the signal to noise ratio, defined as $-20 \log_{10} \sigma$,
and the vertical axis is the word error rate. Each point was the result of
10,000 trials with $a = 1.07$ and $b = 0.07$. Figure 5 gives a similar situation
when $n = 12$. In each figure, simulation results are plotted along with the
error rate predicted by averaging the bound in Theorem 4 over all subcodes
of size 2. These parameters are chosen for direct comparison with Figure 5
in [3].

Figure 6 is a scatter plot of two notions of distance for 10,000 random
vectors when $q = 100$ and $n = 20$: the distance $\delta'(\mathbf{u}, \mathbf{v})$ defined in Section 4
and the distance $d_2(\mathbf{u}, \mathbf{v})$ defined in Section IV.B of [3] for the purposes of
estimating word error rates. The figure shows a close to linear relationship
between these two quantities for random vectors. Figure 7 is a scatter plot of
two likelihood functions (namely Pearson distance and the likelihood function
$\ell_{\mathbf{x}}(\mathbf{y})$ used by the maximum likelihood decoder) for a similarly randomly
generated collection of vectors. Again, a close to linear relationship can be
observed, which provides an explanation for the similar performance of the
corresponding decoders.

Figure 6: Two distances: $d_2$ against $\delta'$

Figure 7: Two likelihood functions: Pearson distance against $\ell_{\mathbf{x}}(\mathbf{y})$

6 Comments 

6.1 A geometric meaning for Pearson distance 

Since the offset $b$ is arbitrary and unknown, and changes the mean of a
vector by $b$, it seems sensible to normalise codewords and received words to
have mean 0. In other words, we consider the words $\zeta(\mathbf{x}) = \mathbf{x} - \bar{x}\mathbf{1}$ and
$\zeta(\mathbf{r}) = \mathbf{r} - \bar{r}\mathbf{1}$ rather than $\mathbf{x}$ and $\mathbf{r}$. Scaling a vector of mean 0 by $a$ does
not change the mean, but scales the standard deviation by a factor of $a$. So
it seems sensible to scale our normalised vectors so that they have standard
deviation 1: if our original vectors were non-constant, we can always find
a scaling factor $a$ that does this. The resulting vectors, $(\mathbf{x} - \bar{x}\mathbf{1})/\sigma_{\mathbf{x}}$ and
$(\mathbf{r} - \bar{r}\mathbf{1})/\sigma_{\mathbf{r}}$, lie on an $(n-1)$-dimensional unit sphere, centred at the origin. A
natural distance measure between two vectors $\mathbf{u}$ and $\mathbf{v}$ on this unit sphere is
their squared Euclidean distance, and it is not difficult to show that this is
exactly $2\,\delta_{\mathrm{Pearson}}(\mathbf{u}, \mathbf{v})$: for unit vectors, $\|\mathbf{u} - \mathbf{v}\|^2 = 2 - 2\langle \mathbf{u}, \mathbf{v} \rangle = 2(1 - \rho_{\mathbf{u},\mathbf{v}})$.
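
The relationship between Pearson distance and Euclidean distance on the unit sphere can be checked directly. For unit vectors $\mathbf{u}, \mathbf{v}$ of mean 0 we have $\|\mathbf{u} - \mathbf{v}\|^2 = 2 - 2\langle \mathbf{u}, \mathbf{v} \rangle = 2(1 - \rho_{\mathbf{u},\mathbf{v}})$, so the squared distance between the normalised vectors is twice the Pearson distance. The sketch below uses arbitrary example vectors, and the helper `normalise` is a name introduced only here.

```python
import math

def normalise(x):
    """Shift to mean 0, then scale to unit Euclidean length."""
    m = sum(x) / len(x)
    z = [xi - m for xi in x]
    s = math.sqrt(sum(t * t for t in z))
    return [t / s for t in z]

def pearson_distance(u, v):
    # For normalised vectors the correlation coefficient is just the dot product.
    nu, nv = normalise(u), normalise(v)
    return 1.0 - sum(a * b for a, b in zip(nu, nv))

x = [0, 1, 2, 0, 3]
r = [0.4, 1.2, 2.9, -0.3, 3.8]
nx, nr = normalise(x), normalise(r)
squared_euclidean = sum((a - b) ** 2 for a, b in zip(nx, nr))
print(squared_euclidean, 2.0 * pearson_distance(r, x))
```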

6.2 Why are constant codewords forbidden? 

The (unnormalised) standard deviation of a constant codeword is 0, so the
Pearson correlation coefficient $\rho_{\mathbf{r},\mathbf{x}}$ is not defined when $\mathbf{x}$ is constant. But
the forbidding of constant codewords is an artifact of the channel itself, not
just the distance measure that is proposed for decoding. To see this, suppose
that $\mathbf{x} = \alpha\mathbf{1}$ is a codeword. Let $\mathbf{r}$ be a received word, and define $\mathbf{s} = \mathbf{r} - \mathbf{x}$.
For any positive $\epsilon \in \mathbb{R}$ we have

\[ \epsilon^{-1}(\mathbf{x} + \epsilon\mathbf{s}) + (1 - \epsilon^{-1})\alpha\mathbf{1} = \mathbf{s} + \alpha\mathbf{1} = \mathbf{r}. \]

Setting $a = \epsilon^{-1}$, $b = (1 - \epsilon^{-1})\alpha$ and $\boldsymbol{\nu} = \epsilon\mathbf{s}$ we have that $\mathbf{r} = a(\mathbf{x} + \boldsymbol{\nu}) + b\mathbf{1}$.

But $\epsilon$ may be taken to be arbitrarily small, and so we see $\mathbf{r}$ could have
been received when $\mathbf{x}$ was transmitted, with an arbitrarily small error vector
$\boldsymbol{\nu} = \epsilon\mathbf{s}$. So any reasonable decoder for this channel would decode every
received vector to $\mathbf{x}$, and a sensible distance measure would set the distance
between $\mathbf{x}$ and any other vector as 0.
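
The argument can be reproduced numerically: for a constant codeword, any received vector is explained by a suitable gain and offset together with an error vector as small as we like. The values of $\alpha$, $\mathbf{r}$ and $\epsilon$ below are arbitrary illustrative choices, and `explain` is a name introduced for this sketch.

```python
def explain(r, alpha, eps):
    """Return (a, b, nu) with r = a*(alpha*1 + nu) + b*1 and nu = eps*(r - alpha*1)."""
    a = 1.0 / eps
    b = (1.0 - 1.0 / eps) * alpha
    nu = [eps * (ri - alpha) for ri in r]
    return a, b, nu

alpha = 2.0
r = [0.3, 1.7, 2.5, 0.9]
for eps in (1.0, 1e-3, 1e-6):
    a, b, nu = explain(r, alpha, eps)
    rebuilt = [a * (alpha + ni) + b for ni in nu]
    largest_error = max(abs(ni) for ni in nu)
    print(eps, largest_error, rebuilt)
```

As `eps` shrinks, the largest entry of the error vector shrinks proportionally while the rebuilt vector stays equal to `r` up to rounding.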

6.3 Future work 

Weber, Immink and Blackburn [5] have studied optimal Pearson codes, which
are the largest codes contained in $\{0, 1, \ldots, q-1\}^n$ that can be correctly decoded
in the zero-error case (when $\sigma = 0$, and so $\boldsymbol{\nu} = \mathbf{0}$). It would be very
interesting to fully explore the interplay between the error correcting capacity
of codes when $\sigma > 0$ and the rate of an optimal code. We hope that the
distance between codewords that is defined in Section 4 will provide a tool
to accomplish this.